Syntax Augmented Machine Translation via Chart Parsing with Integrated Language Modeling
Abstract
We present a hierarchical phrase-based translation model which annotates and generalizes existing phrase translations with syntactic categories derived from parsing the target side of a parallel corpus. We associate target parse trees for each training sentence pair with a search lattice constructed from the existing phrase translations on the corresponding source sentence, and consider techniques to produce a syntactically motivated bilingual synchronous grammar. We describe refinements to a chart-based decoder and k-best extraction techniques to effectively parse the resulting grammar, which contains up to 4000 syntax-derived nonterminals, producing translations that achieve significant improvements over Pharaoh, a state-of-the-art phrase-based system, on the Europarl French-to-English task (Koehn and Monz, 2005).
Similar resources
Lattice Parsing to Integrate Speech Recognition and Rule-Based Machine Translation
In this paper, we present a novel approach to integrating speech recognition and rule-based machine translation by lattice parsing. The presented approach is hybrid in two senses. First, it combines structural and statistical methods for the language modeling task. Second, it employs a chart parser which utilizes manually created syntax rules in addition to scores obtained after statistical processing...
Phrase-Boundary Translation Model Using Shallow Syntactic Labels
The phrase-boundary model for statistical machine translation labels rules with classes of boundary words on the target-side phrases of the training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels, including POS tags and chunk labels. With priority given to chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
Forest Rescoring: Faster Decoding with Integrated Language Models
Efficient decoding has been a fundamental problem in machine translation, especially with an integrated language model which is essential for achieving good translation quality. We develop faster approaches for this problem based on k-best parsing algorithms and demonstrate their effectiveness on both phrase-based and syntax-based MT systems. In both cases, our methods achieve significant speed...
Post-ordering by Parsing for Japanese-English Statistical Machine Translation
Reordering is a difficult task in translating between widely different languages such as Japanese and English. We employ the post-ordering framework proposed by Sudoh et al. (2011b) for Japanese-to-English translation and improve upon the reordering method. The existing post-ordering method reorders a sequence of target language words in a source language word order via SMT, while our method re...
Publication date: 2006